AITopics | Mombasa

Collaborating Authors

Mombasa

Feature-Wise Mixing for Mitigating Contextual Bias in Predictive Supervised Learning

arXiv.org Machine LearningJul-1-2025

Bias in predictive machine learning (ML) models is a fundamental challenge due to the skewed or unfair outcomes produced by biased models. Existing mitigation strategies rely on either post-hoc corrections or rigid constraints. However, emerging research claims that these techniques can limit scalability and reduce generalizability. To address this, this paper introduces a feature-wise mixing framework to mitigate contextual bias. This was done by redistributing feature representations across multiple contextual datasets. To assess feature-wise mixing's effectiveness, four ML classifiers were trained using cross-validation and evaluated with bias-sensitive loss functions, including disparity metrics and mean squared error (MSE), which served as a standard measure of predictive performance. The proposed method achieved an average bias reduction of 43.35% and a statistically significant decrease in MSE across all classifiers trained on mixed datasets. Additionally, benchmarking against established bias mitigation techniques found that feature-wise mixing consistently outperformed SMOTE oversampling and demonstrated competitive effectiveness without requiring explicit bias attribute identification. Feature-wise mixing efficiently avoids the computational overhead typically associated with fairness-aware learning algorithms. Future work could explore applying feature-wise mixing for real-world fields where accurate predictions are necessary.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Machine Learning

2506.23033

Country:

Asia > India > West Bengal > Kolkata (0.05)
Africa > Kenya > Mombasa County > Mombasa (0.05)
Asia > Sri Lanka > Western Province > Colombo > Colombo (0.05)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Industry: Banking & Finance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Validation of Conformal Prediction in Cervical Atypia Classification

Hagos, Misgina Tsighe, Suutala, Antti, Bychkov, Dmitrii, Kücükel, Hakan, von Bahr, Joar, Poceviciute, Milda, Lundin, Johan, Linder, Nina, Lundström, Claes

arXiv.org Artificial IntelligenceMay-15-2025

Deep learning based cervical cancer classification can potentially increase access to screening in low-resource regions. However, deep learning models are often overconfident and do not reliably reflect diagnostic uncertainty. Moreover, they are typically optimized to generate maximum-likelihood predictions, which fail to convey uncertainty or ambiguity in their results. Such challenges can be addressed using conformal prediction, a model-agnostic framework for generating prediction sets that contain likely classes for trained deep-learning models. The size of these prediction sets indicates model uncertainty, contracting as model confidence increases. However, existing conformal prediction evaluation primarily focuses on whether the prediction set includes or covers the true class, often overlooking the presence of extraneous classes. We argue that prediction sets should be truthful and valuable to end users, ensuring that the listed likely classes align with human expectations rather than being overly relaxed and including false positives or unlikely classes. In this study, we comprehensively validate conformal prediction sets using expert annotation sets collected from multiple annotators. We evaluate three conformal prediction approaches applied to three deep-learning models trained for cervical atypia classification. Our expert annotation-based analysis reveals that conventional coverage-based evaluations overestimate performance and that current conformal prediction methods often produce prediction sets that are not well aligned with human labels. Additionally, we explore the capabilities of the conformal prediction methods in identifying ambiguous and out-of-distribution data.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2505.08845

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
Europe > Sweden > Uppsala County > Uppsala (0.04)
North America > United States > Washington > King County > Redmond (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

In pictures: Prayers and reflection mark Eid celebrations around the world

BBC NewsMar-30-2025, 13:08:58 GMT

Muslims around the world have begun celebrating Eid al-Fitr, one of the biggest celebrations in the Islamic calendar. Eid al-Fitr - which means "festival of the breaking of the fast" - is celebrated at the end of Ramadan, a month of fasting for many adults, as well as spiritual reflection and prayer.ReutersHere in Moscow, worshippers are seen preparing for prayer.ReutersHundreds took part in prayers at Tononoka grounds, in Mombasa, KenyaGetty ImagesPrayers were also observed at a stadium in Port Sudan in the east of the countryGetty ImagesLittle children joined adults at the Moskee Essalam in Rotterdam, NetherlandsGetty ImagesGifts are handed out to Muslim children in Lviv, Ukraine, as Russia's war on the country continuesReuters Palestinians in Jabaliya in the northern Gaza Strip pray amidst the rubble of a mosque destroyed in the current war between Israel and HamasGetty ImagesFamilies gather at al-Aqsa mosque in Jerusalem - the third holiest site in IslamReutersA boy yawns during prayers at a stadium in QatarEPAMuslims greet each-other at Martim Moniz Square in Lisbon, PortugalGetty ImagesWomen worshippers gather in Burgess Park, London, for an outdoor prayerEPAThere were also worshippers gathered outside Plebiscito Square in Naples, ItalyReutersSome women took pictures after attending prayers at the Hagia Sophia Grand Mosque in Istanbul, TurkeyGetty ImagesAfghan refugees pray at a mosque on the outskirts of Peshawar, PakistanMiddle EastEuropeEid al-FitrReligionIslamRelated'I was afraid for my life': At the scene of the attack on Palestinian Oscar winner 5 days agoMiddle EastMore8 hrs ago'In Bradford, families spend thousands on new clothes for Eid' Muslims spend large amounts in Bradford's supermarkets, clothes shops and other services before Eid.8 hrs agoEngland1 day ago The tourist has received an award from the city's mayor after restraining a man during a stabbing.1 day agoEurope1 day ago Another 21 people are injured, as a restaurant and several buildings are set ablaze in the city, local officials say.1 day agoWorld1 day ago Town's successful Ramadan lights project expanded A Scunthorpe community group says it has seen an "amazing" response to its lights display.1 day agoLincolnshire1 day ago Bishop says school that changed Easter events'valued' The BBC is not responsible for the content of external sites.

artificial intelligence, celebration, eid celebration, (9 more...)

BBC News

Country:

Europe > United Kingdom > England > Lincolnshire > Scunthorpe (0.26)
Europe > Ukraine > Lviv Oblast > Lviv (0.26)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.26)
(23 more...)

Industry:

Consumer Products & Services (0.72)
Government > Regional Government (0.32)

Technology: Information Technology > Artificial Intelligence (0.32)

Add feedback

Bridging Gaps in Natural Language Processing for Yor\`ub\'a: A Systematic Review of a Decade of Progress and Prospects

Jimoh, Toheeb A., De Wille, Tabea, Nikolov, Nikola S.

arXiv.org Artificial IntelligenceFeb-24-2025

Natural Language Processing (NLP) is becoming a dominant subset of artificial intelligence as the need to help machines understand human language looks indispensable. Several NLP applications are ubiquitous, partly due to the myriads of datasets being churned out daily through mediums like social networking sites. However, the growing development has not been evident in most African languages due to the persisting resource limitation, among other issues. Yor\`ub\'a language, a tonal and morphologically rich African language, suffers a similar fate, resulting in limited NLP usage. To encourage further research towards improving this situation, this systematic literature review aims to comprehensively analyse studies addressing NLP development for Yor\`ub\'a, identifying challenges, resources, techniques, and applications. A well-defined search string from a structured protocol was employed to search, select, and analyse 105 primary studies between 2014 and 2024 from reputable databases. The review highlights the scarcity of annotated corpora, limited availability of pre-trained language models, and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. It also revealed the prominent techniques, including rule-based methods, among others. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and desertion of language for digital usage. This review synthesises existing research, providing a foundation for advancing NLP for Yor\`ub\'a and in African languages generally. It aims to guide future research by identifying gaps and opportunities, thereby contributing to the broader inclusion of Yor\`ub\'a and other under-resourced African languages in global NLP advancements.

african language, dataset, yor, (14 more...)

arXiv.org Artificial Intelligence

2502.17364

Country:

North America > United States (0.14)
Africa > Niger (0.05)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(37 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.68)

Industry:

Information Technology (0.46)
Education (0.46)
Media (0.45)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(5 more...)

Add feedback

RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset

Etori, Naome A., Gini, Maria L.

arXiv.org Artificial IntelligenceFeb-10-2025

Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and the major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult as Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms other models; for sentiment analysis, XLM-R supervised model achieves the highest accuracy (69.2\%) and F1 score (66.1\%), XLM-R semi-supervised (67.2\% accuracy, 64.1\% F1 score). In emotion analysis, DistilBERT supervised leads in accuracy (59.8\%) and F1 score (31\%), mBERT semi-supervised (accuracy (59\% and F1 score 26.5\%). AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with Afri-BERT showing the highest bias and unique sensitivity to empathy emotion. https://github.com/NEtori21/Ride_hailing

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2024.wassa-1.19

2502.0618

Country:

Africa > Kenya > Nairobi City County > Nairobi (0.07)
Africa > Kenya > Nairobi Province (0.06)
Africa > Kenya > Mombasa County > Mombasa (0.05)
(18 more...)

Genre: Research Report > New Finding (0.54)

Industry:

Transportation > Passenger (1.00)
Information Technology (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Add feedback

Generative Style Transfer for MRI Image Segmentation: A Case of Glioma Segmentation in Sub-Saharan Africa

Chepchirchir, Rancy, Sunday, Jill, Confidence, Raymond, Zhang, Dong, Chaudhry, Talha, Anazodo, Udunna C., Muchungi, Kendi, Zou, Yujing

arXiv.org Artificial IntelligenceJan-7-2025

In Sub-Saharan Africa (SSA), the utilization of lower-quality Magnetic Resonance Imaging (MRI) technology raises questions about the applicability of machine learning (ML) methods for clinical tasks. This study aims to provide a robust deep learning-based brain tumor segmentation (BraTS) method tailored for the SSA population using a threefold approach. Firstly, the impact of domain shift from the SSA training data on model efficacy was examined, revealing no significant effect. Secondly, a comparative analysis of 3D and 2D full-resolution models using the nnU-Net framework indicates similar performance of both the models trained for 300 epochs achieving a five-fold cross-validation score of 0.93. Lastly, addressing the performance gap observed in SSA validation as opposed to the relatively larger BraTS glioma (GLI) validation set, two strategies are proposed: fine-tuning SSA cases using the GLI+SSA best-pretrained 2D fullres model at 300 epochs, and introducing a novel neural style transfer-based data augmentation technique for the SSA cases. This investigation underscores the potential of enhancing brain tumor prediction within SSA's unique healthcare landscape.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.04734

Country:

Africa > Sub-Saharan Africa (0.61)
North America > Canada > Quebec > Montreal (0.15)
Africa > Kenya > Nairobi City County > Nairobi (0.05)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

Mondini, Roberto, Kotonya, Neema, Logan, Robert L. IV, Olson, Elizabeth M, Lungati, Angela Oduor, Odongo, Daniel Duke, Ombasa, Tim, Lamba, Hemank, Cahill, Aoife, Tetreault, Joel R., Jaimes, Alejandro

arXiv.org Artificial IntelligenceDec-17-2024

Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.

computational linguistic, dataset, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2412.13098

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Africa > Kenya > Bomet County > Bomet (0.05)
(35 more...)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Voting & Elections (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Model Equality Testing: Which Model Is This API Serving?

Gao, Irena, Liang, Percy, Guestrin, Carlos

arXiv.org Artificial IntelligenceOct-26-2024

Users often interact with large language models through black-box inference APIs, both for closed- and open-weight models (e.g., Llama models are popularly accessed via Amazon Bedrock and Azure AI Studio). In order to cut costs or add functionality, API providers may quantize, watermark, or finetune the underlying model, changing the output distribution -- often without notifying users. We formalize detecting such distortions as Model Equality Testing, a two-sample testing problem, where the user collects samples from the API and a reference distribution and conducts a statistical test to see if the two distributions are the same. We find that tests based on the Maximum Mean Discrepancy between distributions are powerful for this task: a test built on a simple string kernel achieves a median of 77.4% power against a range of distortions, using an average of just 10 samples per prompt. We then apply this test to commercial inference APIs for four Llama models, finding that 11 out of 31 endpoints serve different distributions than reference weights released by Meta.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.20247

Country:

Africa > Middle East > Libya (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London > City of London (0.04)
(17 more...)

Genre: Research Report (1.00)

Industry:

Transportation (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Elaborative Subtopic Query Reformulation for Broad and Indirect Queries in Travel Destination Recommendation

Wen, Qianfeng, Liu, Yifan, Zhang, Joshua, Saad, George, Korikov, Anton, Sambale, Yury, Sanner, Scott

arXiv.org Artificial IntelligenceOct-2-2024

In Query-driven Travel Recommender Systems (RSs), it is crucial to understand the user intent behind challenging natural language (NL) destination queries such as the broadly worded "youth-friendly activities" or the indirect description "a high school graduation trip". Such queries are challenging due to the wide scope and subtlety of potential user intents that confound the ability of retrieval methods to infer relevant destinations from available textual descriptions such as WikiVoyage. While query reformulation (QR) has proven effective in enhancing retrieval by addressing user intent, existing QR methods tend to focus only on expanding the range of potentially matching query subtopics (breadth) or elaborating on the potential meaning of a query (depth), but not both. In this paper, we introduce Elaborative Subtopic Query Reformulation (EQR), a large language model-based QR method that combines both breadth and depth by generating potential query subtopics with information-rich elaborations. We also release TravelDest, a novel dataset for query-driven travel destination RSs. Experiments on TravelDest show that EQR achieves significant improvements in recall and precision over existing state-of-the-art QR methods.

qr method, query, retrieval, (15 more...)

arXiv.org Artificial Intelligence

2410.01598

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Quebec > Montreal (0.14)
(51 more...)

Genre: Research Report (0.82)

Industry: Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)

Add feedback

FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats

Zhang, Xuanliang, Wang, Dingzirui, Dou, Longxu, Wang, Baoxin, Wu, Dayong, Zhu, Qingfu, Che, Wanxiang

arXiv.org Artificial IntelligenceAug-27-2024

The table reasoning task aims to answer the question according to the given table. Currently, using Large Language Models (LLMs) is the predominant method for table reasoning. Most existing methods employ a fixed tabular format to represent the table, which could limit the performance. Given that each instance requires different capabilities and models possess varying abilities, we assert that different instances and models suit different tabular formats. We prove the aforementioned claim through quantitative analysis of experimental results, where different instances and models achieve different performances using various tabular formats. Building on this discussion, we propose FLEXTAF-Single and FLEXTAF-Vote to enhance table reasoning performance by employing flexible tabular formats. Specifically, (i) FLEXTAF-Single trains a classifier to predict the most suitable tabular format based on the instance and the LLM. (ii) FLEXTAF-Vote integrates the results across different formats. Our experiments on WikiTableQuestions and TabFact reveal significant improvements, with average gains of 2.3% and 4.8% compared to the best performance achieved using a fixed tabular format with greedy decoding and self-consistency decoding, thereby validating the effectiveness of our methods.

demonstration, tabular format, utterance, (14 more...)

arXiv.org Artificial Intelligence

2408.08841

Country:

South America > Brazil (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
(20 more...)

Genre: Research Report (0.51)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback